A Novel Approach for Word Spotting Using Merge-Split Edit Distance

Internal identifier: 000A27 (Main/Exploration); previous: 000A26; next: 000A28

Authors: Khurram Khurshid [France]; Claudie Faure [France]; Nicole Vincent [France]

Source:

Lecture Notes in Computer Science [ 0302-9743 ] ; 2009.

RBID : ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7

Abstract

Edit distance matching has been used in the literature for word spotting, with characters taken as primitives. The recognition rate, however, is limited by inconsistencies in character segmentation (broken or merged characters) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split edit distance that overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system extracts the words and characters in the text and then assigns a set of features to each character. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing the strings of characters using the proposed Merge-Split edit distance algorithm. Evaluation of the method on 19th-century historical document images shows extremely promising results.
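
The two matching levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the merge cost function or the character features, so the L1 local cost, the constant `GAP_COST`, the `allow_merge_split` flag and the rule "concatenate two adjacent characters and compare them with DTW" are all assumptions made for illustration. Each character is assumed to be a list of feature vectors, and a word a list of characters.

```python
from math import inf

GAP_COST = 1.0  # hypothetical cost of inserting or deleting one character


def dtw(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: L1 distance between the two feature vectors (assumption).
            local = sum(abs(x - y) for x, y in zip(a[i - 1], b[j - 1]))
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def merge_split_edit_distance(query, cand, allow_merge_split=True):
    """Edit distance over characters, with optional merge and split steps.

    Illustrative only: a merge or split is scored by concatenating two
    adjacent characters and comparing the result with one character via DTW.
    """
    n, m = len(query), len(cand)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i >= 1:  # delete one query character
                best = min(best, D[i - 1][j] + GAP_COST)
            if j >= 1:  # insert one candidate character
                best = min(best, D[i][j - 1] + GAP_COST)
            if i >= 1 and j >= 1:  # match one character against one character
                best = min(best, D[i - 1][j - 1] + dtw(query[i - 1], cand[j - 1]))
            if allow_merge_split and i >= 2 and j >= 1:
                # Merge two query characters and match them to one candidate character.
                merged = query[i - 2] + query[i - 1]
                best = min(best, D[i - 2][j - 1] + dtw(merged, cand[j - 1]))
            if allow_merge_split and i >= 1 and j >= 2:
                # One query character matches two candidate pieces (a broken character).
                merged = cand[j - 2] + cand[j - 1]
                best = min(best, D[i - 1][j - 2] + dtw(query[i - 1], merged))
            D[i][j] = best
    return D[n][m]


# A character and the same character broken into two pieces by segmentation.
whole = [(1.0,), (2.0,), (1.0,), (2.0,)]
left, right = whole[:2], whole[2:]
with_ops = merge_split_edit_distance([whole], [left, right])
without_ops = merge_split_edit_distance([whole], [left, right], allow_merge_split=False)
print(with_ops, without_ops)
```

In this toy case the broken character is matched at no cost when merge and split steps are allowed, whereas the plain edit distance has to pay for a mismatch plus an insertion.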

URL:

https://api.istex.fr/document/8C1F3989D2466FF4A187343DA0F0E8326A4176F7/fulltext/pdf

DOI: 10.1007/978-3-642-03767-2_26

Affiliations:

Links to previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000D61
to stream Istex, to step Curation: 000D32
to stream Istex, to step Checkpoint: 000549
to stream Main, to step Merge: 000A35
to stream Main, to step Curation: 000A27

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Novel Approach for Word Spotting Using Merge-Split Edit Distance</title>
<author><name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
</author>
<author><name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
</author>
<author><name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03767-2_26</idno>
<idno type="url">https://api.istex.fr/document/8C1F3989D2466FF4A187343DA0F0E8326A4176F7/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000D61</idno>
<idno type="wicri:Area/Istex/Curation">000D32</idno>
<idno type="wicri:Area/Istex/Checkpoint">000549</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Khurshid K:a:novel:approach</idno>
<idno type="wicri:Area/Main/Merge">000A35</idno>
<idno type="wicri:Area/Main/Curation">000A27</idno>
<idno type="wicri:Area/Main/Exploration">000A27</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A Novel Approach for Word Spotting Using Merge-Split Edit Distance</title>
<author><name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
<orgName type="university">Université Paris Descartes</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13</wicri:regionArea>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
<orgName type="university">Université Paris Descartes</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">8C1F3989D2466FF4A187343DA0F0E8326A4176F7</idno>
<idno type="DOI">10.1007/978-3-642-03767-2_26</idno>
<idno type="ChapterID">26</idno>
<idno type="ChapterID">Chap26</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Edit distance matching has been used in the literature for word spotting, with characters taken as primitives. The recognition rate, however, is limited by inconsistencies in character segmentation (broken or merged characters) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split edit distance that overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system extracts the words and characters in the text and then assigns a set of features to each character. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing the strings of characters using the proposed Merge-Split edit distance algorithm. Evaluation of the method on 19th-century historical document images shows extremely promising results.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Île-de-France</li>
</region>
<settlement><li>Paris</li>
</settlement>
<orgName><li>Université Paris Descartes</li>
</orgName>
</list>
<tree><country name="France"><region name="Île-de-France"><name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
</region>
<name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
<name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
<name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
<name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
<name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
</country>
</tree>
</affiliations>
</record>
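
The identifiers stored in the record's `idno` elements can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on an excerpt of the `publicationStmt` element copied from the record above; the helper name `idno_map` is hypothetical. The excerpt is used on purpose: the listing shows no declaration for the `wicri:` prefix, so a namespace-aware parser may reject the full record as displayed.

```python
import xml.etree.ElementTree as ET

# Excerpt of the <publicationStmt> element, copied from the record above.
EXCERPT = """<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03767-2_26</idno>
<idno type="wicri:Area/Istex/Corpus">000D61</idno>
<idno type="wicri:Area/Istex/Curation">000D32</idno>
<idno type="wicri:Area/Istex/Checkpoint">000549</idno>
<idno type="wicri:Area/Main/Merge">000A35</idno>
<idno type="wicri:Area/Main/Curation">000A27</idno>
<idno type="wicri:Area/Main/Exploration">000A27</idno>
</publicationStmt>"""


def idno_map(xml_text):
    """Return a dict mapping each idno 'type' attribute to its text value."""
    root = ET.fromstring(xml_text)
    return {el.get("type"): el.text for el in root.iter("idno")}


ids = idno_map(EXCERPT)
print(ids["doi"])
print(ids["wicri:Area/Main/Exploration"])
```

The `wicri:Area/...` values are the same keys listed under the links to previous steps, so the resulting dictionary gives the record's key at each processing step.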

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A27 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A27 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7
   |texte=   A Novel Approach for Word Spotting Using Merge-Split Edit Distance
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
